Automatic spectrum analyzer



[Drawings: 8 sheets, FIGS. 1 to 10. J. L. Flanagan, Automatic Spectrum Analyzer, filed Dec. 6, 1955, patented June 27, 1961, No. 2,990,453. Legible labels on the frequency-response graph include "Frequency (cps)", "Equalizing pre-emphasis network response", "Relative amplitudes of formants (Peterson and Barney data)" and "Equalized response". Legible labels on the vowel segmenter block diagram include "Band pass amplifier 300-800, rectifier", "Band elimination amplifier 300-800, rectifier", "Bistable multivibrator", "Center and peak clipper", "Electronic switch", "Speech input" and "Segmented speech output".]

United States Patent 2,990,453
Patented June 27, 1961
AUTOMATIC SPECTRUM ANALYZER
James L. Flanagan, Cambridge, Mass., assignor to the United States of America as represented by the Secretary of the Air Force
Filed Dec. 6, 1955, Ser. No. 551,478
10 Claims. (Cl. 179-15.55)
(Granted under Title 35, U.S. Code (1952), sec. 266)

The invention described herein may be manufactured and used by or for the United States Government for governmental purposes without payment to me of any royalty thereon.

This invention relates to the transmission of speech over a communication channel highly restricted in transmission bandwidth and capacity. The invention is pertinent to a speech bandwidth-compression system of the analysis-synthesis type, in which the speech information is coded in terms of signals representing the major vocal resonances (or formants) and the nature of the excitation of the vocal tract, both as functions of time during the production of speech. The major vocal resonances, or formants, are manifested as maxima in the frequency spectrum of the speech radiated by a talker. It is basic to the operation of the speech compression system that the frequencies of these spectral maxima (i.e., the formant frequencies) be determined by automatic analysis as the speech is uttered, and that signals representing these frequencies be transmitted to the speech synthesizer in order that the speech may be reproduced with negligible time delay.

The conventional telephone channel, which is a waveform transmission system, requires a transmission bandwidth of the order of 3000 cps. and a signal-to-noise ratio of the order of 30 db. Speech analysis-synthesis systems for bandwidth compression heretofore known have required a total transmission bandwidth of the order of 300 cps. and a signal-to-noise ratio of about 30 db. The present invention makes possible a speech bandwidth compression system requiring a total transmission bandwidth of the order of 50 cps. and a signal-to-noise ratio of approximately 30 db.
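The bandwidth figures quoted above can be compared with a little arithmetic. The following sketch computes the compression ratio of each system relative to the 3000 cps telephone channel. As an added illustration that does not appear in the specification, it also applies the Shannon capacity formula C = B log2(1 + S/N) to each bandwidth at 30 db. The function and dictionary names are hypothetical.

```python
import math

def capacity_bits_per_sec(bandwidth_cps, snr_db):
    """Shannon capacity C = B * log2(1 + S/N) of an ideal noisy channel."""
    snr_linear = 10 ** (snr_db / 10.0)  # 30 db corresponds to a power ratio of 1000
    return bandwidth_cps * math.log2(1.0 + snr_linear)

# Bandwidths quoted in the text, each at roughly 30 db signal-to-noise ratio.
channels = {
    "telephone channel": 3000.0,
    "prior analysis-synthesis systems": 300.0,
    "present invention": 50.0,
}

for name, bandwidth in channels.items():
    ratio = channels["telephone channel"] / bandwidth
    capacity = capacity_bits_per_sec(bandwidth, 30.0)
    print(f"{name}: {bandwidth:.0f} cps, {ratio:.0f}:1, about {capacity:.0f} bits/s")
```

On these figures the present system occupies one sixtieth of the telephone bandwidth, and one sixth of the bandwidth of the earlier analysis-synthesis systems.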

The invention described herein, when incorporated as a bandwidth compression system of the character indicated, operates to accept continuous speech at its input and to deliver at its output electrical signals, varying slowly with time, whose amplitudes represent the frequencies of the first three major vocal resonances, or the first three formant frequencies of the input speech. The accomplishment of this task in the manner herein described involves, first, obtaining a gross Fourier transformation of the input speech by means of a set of contiguous band-pass filters, to provide a short-time amplitude spectrum of the speech. Appropriate analyzing and sampling circuitry then examines this short-time spectrum to determine, and properly indicate, the frequencies of the first three maxima. The circuitry herein illustrated (reproduced, in substance, from the applicant's Doctoral Thesis entitled "A Speech Analyzer for a Formant-Coding Compression System," submitted to Massachusetts Institute of Technology in May 1955, and published in abridged form in the 1956 Journal of the Acoustical Society of America) is representative of a successfully operated embodiment of the invention. Other characteristics and advantages of the invention will appear upon examination of the following description of the invention as illustrated in the accompanying drawings wherein:

FIG. 1 is a block diagram indicating major components of the speech analysis apparatus adapted to receive electrical signals corresponding to the oral intelligence to be analyzed and to convert such received signals into electrical output signals of a character suitable to be handled in an analog type of speech synthesizer, in accordance with the principles of the present invention;

FIG. 2 is a block diagram of the sub-components entering into the circuitry constituting the vowel segmenter component diagrammatically shown in FIG. 1;

FIG. 3 is a circuit diagram showing vowel segmenter equipment indicated in block form in FIG. 2;

FIG. 4 is a circuit diagram of the multivibrator and switching branch of the vowel segmenter equipment indicated in block form in FIG. 2;

FIG. 5 is a circuit diagram of the analyzing filter component shown in block form in FIG. 1;

FIGS. 6 and 7 show the circuitry of some of the sequentially arranged components constituting the block diagram of FIG. 1;

FIG. 8 is a graph showing the frequency response characteristics of the driving amplifier for the analyzing filter set;

FIG. 9 is a pictorial representation of speech waveforms as they appear at the indicated points in the vowel segmenter;

FIG. 10 is a circuit diagram indicating an alternate method for speech formant extraction.

The basic principles of the spectrum-sampling formant extractor are illustrated in FIG. 1. The task of the system is to accept continuous speech at its input and to yield output voltages whose magnitudes, as functions of time, represent the frequencies of the first three speech formants. Continuous speech is fed directly, or through the vowel segmenting apparatus, into the analyzing filter set. The vowel segmenter (designated by reference numeral 10 in FIG. 1) is designed to allow only the vowel sounds to enter the analyzing system 11. It is for these sounds that the formant structure is best defined. The filter set is composed of 36 contiguous band-pass filters, each with an associated amplifier, rectifier and smoothing network. The filter set provides, therefore, a short-time spectrum of the input speech. The outputs of the filter set 11 are scanned at 60 times per second by a constant-speed motor-operated sampling switch 12, to produce a time function of voltage samples representing the short-time spectrum of the input speech.

The time function obtained during one scan (in time period T) of the sampling switch is illustrated in FIG. 1 by waveform (A), which is a waveform for a spectrum having only two maxima (or formants). The single negative sample shown in dash lines as part of said sample (A) is utilized in counter-resetting units 13 and 14 for resetting and synchronizing the equipment at the end of each scan. The output samples from the switch are sent through a clamper circuit 15 to produce the stairstep waveform (B). The waveform (B) is differentiated and then peak-clipped (in circuitry 16) to render the apparatus reasonably insensitive to the level of the input speech, yielding waveform (C). The differentiation produces a train of pulses which changes polarity at the times when the spectral maxima are passed: namely, t1 and t2. These pulses are allowed to trigger a binary scaler circuit 17 which changes state each time the derivative pulses change sign; the scaler yields waveform (D). The output of the scaler is differentiated and rectified (at unit 18) to produce individual pulses indicating the times t1 and t2 within the sampling interval T. These pulses are shown in (E). The next function is to separate these pulses marking the maxima into individual channels, and to convert the times t1 and t2 into voltages. Separation is accomplished by allowing the pulses (E) to trigger a scale-of-ten counter (shown at 19) which is reset by a pulse from unit 13 at the end of each sampling period (i.e., 60 per sec.). The counter 19 utilizes a multi-cathode glow tube, more fully described [...] obtain (G), which represents an effective separation of the pulses in (E). The separated pulses, still indicating t1 and t2, independently generate (in units 21) short gate pulses of accurately controlled duration, shown at (H). (The duration of the gate pulses is set to be 500 microseconds.) The gates 21 are delivered to units 22, where they sample a calibrating (or sweep) waveform (shown at (I) and derived from generator 23) as a linearly increasing (sawtooth) voltage. This calibrating waveform is generated in synchronism with the sampling switch. The amplitude values of the calibrating wave, read at t1 and t2, are stored in the sampler circuits until the next sampling intervals. The voltages stored by the samplers are illustrated by the waveforms (J). These stored voltages are made available as outputs 24. The sampler outputs may be smoothed by a low-pass network having a cutoff frequency chosen in accordance with the expected bandwidth of the formant signal. This cutoff frequency is of the order of 10 cps. The outputs 24 of the device, therefore, are smoothed voltages whose magnitudes, as functions of time, represent the formant frequencies.

A block diagram illustrating the principles of operation of the vowel segmenter 10, and the inter-connection of its components, is shown in FIG. 2. The circuit diagram for the vowel segmenter is shown in FIGS. 3 and 4.
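The scan, differentiate, and time-to-voltage chain described above may be easier to follow as a small digital sketch. The code below is not the patented circuit. It takes one scan of filter outputs, keeps only the sign of the sample-to-sample change (which makes the result insensitive to speech level), marks each change from rising to falling as a maximum, and reads a linear sweep at that position. The function name, the linear frequency scale, and the sweep limits are assumptions made for illustration.

```python
def formant_frequencies(spectrum, f_low, f_high, max_formants=3):
    """Locate spectral maxima in one scan of the filter outputs.

    spectrum: rectified, smoothed filter outputs in scan order.
    f_low, f_high: sweep values at the first and last scan positions
                   (hypothetical; a linear sweep is assumed).
    """
    n = len(spectrum)
    if n < 2:
        return []
    peaks = []
    last_sign = 0
    for k in range(1, n):
        # Differentiate the stairstep and keep only the polarity (clipping).
        d = spectrum[k] - spectrum[k - 1]
        sign = (d > 0) - (d < 0)
        if sign == 0:
            continue  # flat step: no derivative pulse
        # A maximum has just been passed when the polarity goes from + to -.
        if last_sign > 0 and sign < 0:
            peaks.append(k - 1)
        last_sign = sign
    # Read the linear calibrating sweep at each marked scan position.
    return [f_low + (f_high - f_low) * k / (n - 1) for k in peaks[:max_formants]]

# One scan with two maxima, as in waveform (A); the values are invented.
scan = [1, 3, 6, 4, 2, 3, 7, 9, 5, 2, 1, 2]
print(formant_frequencies(scan, 100.0, 3400.0))
print(formant_frequencies([10 * v for v in scan], 100.0, 3400.0))  # same result
```

Because only the polarity of the difference is used, multiplying every sample by a constant leaves the output unchanged, which is the digital counterpart of the peak clipping in circuitry 16.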

The upper rectangle 30 of FIG. 3 shows one channel of the amplifier-rectifier components 27, 28 and 29 of the vowel segmenter, the other rectangle 301 containing identical components yielding an output of opposite polarity. Opposite polarity outputs are taken, therefore, from the band-pass and the band-pass elimination channels 31 and 32, respectively. These outputs are smoothed with RC networks 33 and 34, and differenced in a resistive summing network 35. The difference voltage is applied to cathode follower triode 36 and then passes through center and peak clippers 37 and 38. The center and peak clippers are arranged to have an output voltage vs. input voltage characteristic that is essentially a step function, +1.5 volts in magnitude, occurring at an input voltage of +15 volts. The clipped difference voltage is amplified in amplifier 39, and then passed to input terminal A or B of FIG. 4, to switch the bistable multivibrator 40 of the electronic switch circuit, shown in FIG. 4. Terminals A and B of said multivibrator are also indicated in FIG. 3.

The output of the bistable multivibrator circuit is delivered to lines 41 and 42 (FIG. 4) and is passed through a symmetrical peak clipper 43, 44 with variable build-up and decay time control. The clipped multivibrator output is then delivered to lines 46 and 47, and allowed to gate a set of four balanced amplifier stages, constituted by twin triodes 48 and 49. The four balanced amplifier stages, having two push-pull inputs and a single balanced output, act as a single pole, double throw switch. The speech to be segmented passes through this switch and is amplified in twin triode 50, whose plate circuit supplies the output to terminals 52 and 53 of transformer 51, which terminals may also serve as the input terminals (see FIG. 5) of the analyzer filter set. Connection of a speech input to terminals 25(b) yields the segmented vowel sounds at the output terminals 52 and 53, while connection to the input terminals 25b yields the segmented consonants at the output terminals 52 and 53.
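The decision logic of the vowel segmenter can be sketched in a few lines, assuming the two smoothed channel envelopes are already available as sample sequences. The sketch subtracts the band-elimination envelope from the band-pass envelope, reduces the difference to a two-level decision, and routes each speech sample to one of two outputs in the manner of a single pole, double throw switch. The function name and the threshold are hypothetical, as is the reading that a dominant band-pass envelope indicates a vowel. The build-up and decay time control of the clipper is not modelled.

```python
def segment_vowels(band_env, rest_env, speech, threshold=0.0):
    """Gate speech samples by comparing two smoothed band envelopes.

    band_env:  smoothed, rectified output of the band-pass channel.
    rest_env:  smoothed, rectified output of the band-elimination channel.
    speech:    the speech samples to be switched.
    threshold: hypothetical decision level applied to the difference.
    Returns (vowel_output, consonant_output), the two throws of the switch.
    """
    vowels, consonants = [], []
    for b, r, x in zip(band_env, rest_env, speech):
        # Opposite-polarity outputs combined in a summing network: b - r.
        difference = b - r
        # Center and peak clipping leave an essentially two-level signal.
        is_vowel = difference > threshold
        # The sample appears at one output; the other output is silent.
        vowels.append(x if is_vowel else 0.0)
        consonants.append(0.0 if is_vowel else x)
    return vowels, consonants

# Invented envelopes: the middle two samples are treated as vowel-like.
v, c = segment_vowels([0.1, 0.9, 0.8, 0.2], [0.5, 0.2, 0.1, 0.6], [1.0, 2.0, 3.0, 4.0])
print(v, c)
```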

From the vowel segmenter the signal energy passes to the analyzing filter set, consisting of 36 contiguous band-pass filters, all branching from the above-described single set of input terminals, as indicated at 52 and 53 in FIG. 5, and having individual output terminals corresponding to the output terminals shown at 56-1, 56-2, 56-3, 56-4 and 56-36 respectively. Each channel includes components corresponding to those indicated at 61, 62, 63, 64 and 65 as the components of channel No. 1. These components include a single-tuned circuit 61, a feedback amplifier 62, 63, a transformer 64 and a full-wave rectifier 65. Either a positive or negative polarity output is available. For the present arrangement of the equipment the positive output from the filter set is used.
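One channel of such a filter set (tuned circuit, full-wave rectifier, smoothing) has a simple digital stand-in, sketched below. A two-pole resonator plays the part of the single-tuned circuit and a one-pole low-pass plays the part of the smoothing network. The sampling rate, center frequency, bandwidth, and smoothing time are all hypothetical values chosen for the demonstration. The specification describes analog circuits and does not state these figures.

```python
import math

def filter_channel_output(samples, fs, center_cps, bandwidth_cps, smooth_sec=0.01):
    """Smoothed, rectified output of one band-pass channel (digital stand-in)."""
    # Two-pole resonator standing in for the single-tuned circuit.
    r = math.exp(-math.pi * bandwidth_cps / fs)
    theta = 2.0 * math.pi * center_cps / fs
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    # One-pole low-pass standing in for the smoothing network.
    alpha = 1.0 - math.exp(-1.0 / (smooth_sec * fs))
    y1 = y2 = 0.0
    level = 0.0
    for x in samples:
        y = x + a1 * y1 + a2 * y2
        y2, y1 = y1, y
        level += alpha * (abs(y) - level)  # full-wave rectify, then smooth
    return level

def tone(freq_cps, fs, seconds):
    """Unit-amplitude sine wave used as a test signal."""
    n = int(fs * seconds)
    return [math.sin(2.0 * math.pi * freq_cps * k / fs) for k in range(n)]

fs = 8000.0  # hypothetical sampling rate
on_center = filter_channel_output(tone(500.0, fs, 0.1), fs, 500.0, 100.0)
off_center = filter_channel_output(tone(1500.0, fs, 0.1), fs, 500.0, 100.0)
print(on_center > off_center)
```

Running 36 such channels with contiguous pass bands and collecting their outputs in order would give one short-time spectrum per scan.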

The positive D.C. outputs of the filter channels are carried over leads 66 (FIG. 1) to the circumferentially spaced points 67 of the sampler 12, for successive transfer of signals to the rotating brush 68 whose pivoting end serves as a point of connection for the terminal of the output lead 69 which delivers the successively commutated signals to clamper 15. The clamper 15, as shown in FIG. 6, includes input terminals 70, a lead 71 to the right-hand grid of the first of two twin triode amplifiers 72, 73, and a cathode follower output circuit which divides into two branches 74 and 75, the former supplying the signal to a differentiator-clipper network 78-79, thence through amplifiers 76 and 77, in sequence, to the cathode-follower output terminals 82. Branch 75 supplies excitation voltage to the grid of triode 84 in the primary circuit of transformer 85. The secondary circuit of transformer 85 includes the rectifier 86 and the grid circuit of dual amplifier 87, the plate circuit 88 of said dual amplifier 87 being supplied from the positive source of high voltage indicated at 89. The output of amplifier 87 is delivered by way of said conductor 88 to amplifier 90 and the amplified output is divided between branches 91 and 92. The circuitry 84 through 90 generates the sawtooth calibrating waveform to be used in the sampler circuits 22 (FIG. 1). The sawtooth calibrating waves for the F1 and for the F2 and F3 samplers are delivered to output terminals 99 and 100 by way of amplifier stages 95 and 96 on the one hand and 97 and 98 on the other. The sawtooth pulses are reshaped in the two channels leading to terminals 99 and 100 by the diodes 93 and 94. The sweep voltages supplied the F1 sampler and the F2 and F3 samplers are thereby constrained to the frequency ranges normally occupied by the speech formants. Typical waveforms for the F1 and F2-F3 samplers are shown at terminals 99 and 100, respectively.
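The effect of constraining the sweep for each sampler can be sketched as a linear sawtooth that is limited to a chosen range. Treating the diode reshaping as simple limiting is an assumption made here, and the sweep limits and formant ranges in the example are hypothetical figures, not values taken from the specification.

```python
def calibrating_sweep(t, period, f_start, f_end, f_min, f_max):
    """Value of a linear sawtooth at time t, limited to the range [f_min, f_max].

    period:         duration of one scan (1/60 sec. for 60 scans per second).
    f_start, f_end: sweep values at the start and end of the scan (hypothetical).
    f_min, f_max:   range allowed for this sampler (hypothetical).
    """
    phase = (t % period) / period  # position within the current scan, 0 to 1
    value = f_start + (f_end - f_start) * phase
    return min(max(value, f_min), f_max)

period = 1.0 / 60.0
halfway = period / 2.0
# The same instant read through two differently limited sweeps.
print(calibrating_sweep(halfway, period, 0.0, 3600.0, 200.0, 1000.0))  # F1 sweep
print(calibrating_sweep(halfway, period, 0.0, 3600.0, 800.0, 3500.0))  # F2-F3 sweep
```

With limiting of this kind, a gate pulse arriving outside the range expected for a given formant cannot produce a reading outside that range.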

The output of the differentiator-clipper 16, as delivered to the output terminals 82 heretofore described, is supplied to the binary scaler 17, shown in FIG. 7 as consisting of the twin triode 105 and an output conductor 106 leading to a differentiator-rectifier network 18 (as designated in the block diagram of FIG. 1), shown in FIG. 7 as including capacitor 107, resistor 108, and crystal diode 109. After such differentiation and rectification the successive signals are delivered to twin triode 110, which coacts with multi-cathode glow discharge tube 111 to perform the pulse counting and pulse separating operations heretofore referred to. At the end of each counting cycle the counting circuit is reset for the commencement of a new cycle by delivery of a resetting pulse to the final cathode element 123 of the glow discharge tube 111, each resetting pulse being delivered over line 139 constituting the output circuit of transformer 140, whose alternating current output is rectified by action of diode 144. The primary circuit of transformer 140 receives high voltage energization from the indicated 300 v. source, the action being controlled by triode 142, whose plate element is connected to the transformer primary by lead 141. Triode 142 obtains its input from conductor 75 (FIG. 6) and is carried beyond cutoff by the negative sample pulse shown at (A) in FIG. 1.

The outputs from the counter circuitry 19 are delivered by way of conductor 126, as shown in FIG. 7, to cathode follower triode 127, whose output is differentiated and rectified by the combined action of capacitor 128 and crystal diode 129, after which the differentiated and rectified output is allowed to trigger a one-shot multivibrator, shown as the twin triode 130 with its associated timing network. This results in the delivery of gate pulses over conductor 131 to sampler tube 132, whose plate circuit includes a lead 133 which receives the sawtooth pulse train generated in the element 23 and delivered thereto by way of either of terminals 99 or 100, previously described in the discussion of FIG. 6. The voltage read and stored by sampler tube 132 is delivered to triode cathode follower 137 and the output is supplied to the output terminals 138 indicated in FIG. 7. These terminals 138 correspond to the terminals 24 of FIG. 1, the circuitry being duplicated for each formant sampler F1-F2-F3.
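Each sampler holds its reading until the next scan, so its output is a sequence of values updated 60 times per second. The earlier description notes that these outputs may be smoothed by a low-pass network with a cutoff of the order of 10 cps. A minimal sketch of that smoothing follows, using a one-pole digital low-pass as a stand-in for the network. The function name and the choice of a one-pole filter are assumptions.

```python
import math

def smooth_held_samples(held_values, scan_rate=60.0, cutoff_cps=10.0):
    """One-pole low-pass applied to the per-scan held sampler outputs."""
    # Coefficient of a discrete one-pole filter for this cutoff and update rate.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_cps / scan_rate)
    smoothed = []
    state = held_values[0] if held_values else 0.0  # start at the first reading
    for value in held_values:
        state += alpha * (value - state)
        smoothed.append(state)
    return smoothed

# Invented formant track: a jump from 500 to 700 is eased over a few scans.
track = [500.0] * 3 + [700.0] * 6
print([round(v) for v in smooth_held_samples(track)])
```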

FIG. 10 illustrates alternate circuitry for achieving the purposes of the invention. In this FIG. 10 the elements 10 and 11 correspond to those similarly designated in FIG. 1. The filter set 11 yields essentially the values of the short-time spectrum, F0(w), at discrete frequency points w1, w2, ... wn. The spectrum F0(w) is effectively multiplied by the values of the spectra F1(w), F2(w), ... Fj(w), consisting of standard spectral patterns stored in the circuitry. The circuitry operates on the principle of cross-correlating the short-time spectrum of the input speech with this group of standard spectral patterns. The system periodically indicates the spectral pattern which affords the best correlation. As speech is fed into the system, the indicated network performs a computation of the correlation between each successively stored pattern and the short-time spectrum of input speech. This computation is performed for zero delay between the correlated functions. The stored pattern affording the maximum correlation during any particular selecting interval is indicated by the maximum amplitude selector circuitry indicated in block form at 104 in FIG. 10, which circuitry may involve the multi-cathode glow-transfer tube 111 and associated appropriate circuitry, or any equivalent maximum amplitude selector apparatus may be substituted. Selection of a particular stored pattern is equivalent to specifying a standard set of speech formants. Therefore, it is possible to utilize the system of FIG. 10 as a formant tracker to attain the same ultimate result as is attained by the circuitry of FIGS. 1 to 7 inclusive.

The spectrum F0(w) is effectively multiplied by the values of the stored spectra at the corresponding discrete frequency points by appropriately setting the arms on each column of the potentiometers associated with said stored spectral patterns. The outputs from the arms of each vertical column of potentiometers, as indicated in FIG. 10, are subjected to a summing operation and the maximum amplitude selector 104 periodically indicates the column (that is, the standard pattern) which affords the best correlation. A signal representing the column thus selected may then be supplied to the speech synthesizer equipment, said signal effectively specifying the formant pattern of the speech entering the analyzing filter set, and said signal hence being equivalent to the outputs 24 (FIG. 1) of the system illustrated in FIGS. 1 to 7 inclusive.
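The multiply, sum, and select operation of FIG. 10 amounts to a zero-delay correlation followed by a choice of the largest result. In the sketch below each stored pattern corresponds to one column of potentiometer settings. The function name and the example values are hypothetical, and no normalization is applied because the description does not mention one.

```python
def best_pattern(spectrum, patterns):
    """Index of the stored pattern with the largest zero-delay correlation.

    spectrum: values of F0(w) at the discrete frequency points w1 ... wn.
    patterns: stored spectra F1(w) ... Fj(w) sampled at the same points.
    Returns (index_of_best_pattern, list_of_correlation_sums).
    """
    scores = []
    for pattern in patterns:
        # Each arm scales one filter output by the stored value at that
        # frequency point; the products of one column are then summed.
        scores.append(sum(f0 * fj for f0, fj in zip(spectrum, pattern)))
    # The maximum amplitude selector indicates the column with the largest sum.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Invented four-point spectrum and three invented stored patterns.
spectrum = [0.1, 0.9, 0.2, 0.8]
patterns = [[1, 0, 1, 0], [0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]]
index, sums = best_pattern(spectrum, patterns)
print(index, sums)
```

One property of an unnormalized sum is that a stored pattern with uniformly large values can win on size alone, so in practice the stored patterns would need comparable overall levels.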

What is claimed is:

1. A speech analyzing apparatus for the development of electrical energy impulses suitable for application to speech synthesizing equipment, said apparatus comprising signal receiving means for accepting signals of a frequency corresponding to the frequency of sound waves generated by selected formant components of continuous speech, means for cyclically scanning said signals to produce a waveform for each scanning cycle, which waveform will include as many amplitude peaks as there are selected formant components in the speech received during said scanning cycle, means for converting said amplitude peaks to pulses whose polarity reverses on each occurrence of signal components of maximum voltage amplitude within a predetermined spectral range constituting the identifying frequency envelope for said selected formant-derived signals, and means for counting the polarity reversals.

2. A speech analyzing apparatus for the development of electrical energy impulses suitable for application to speech synthesizing equipment of the terminal-analog type, or equivalent computer mechanism, said apparatus comprising signal receiving means for accepting signals of a frequency corresponding to the frequency of sound waves generated by selected formant components of continuous speech, means for cyclically scanning said signals to produce a waveform for each scanning cycle, which waveform will include as many amplitude peaks as there are selected formant components in the speech received during said scanning cycle, means for converting said amplitude peaks to pulses whose polarity reverses on each occurrence of signal components of maximum voltage amplitude within a predetermined spectral range constituting the identifying frequency envelope for said selected formant-derived signals, means for counting the polarity reversals, and means for obtaining an output voltage whose magnitude is proportional to the counting total achieved by said counting means in a given counting interval.

3. Speech analyzing apparatus as defined in claim 1 including means associated with said counting means for separating said polarity reversals into separate circuits representative of the respective formant components, in accordance with the relative time displacements between said polarity reversals.

4. Speech analyzing apparatus as defined in claim 1 wherein said signal receiving means includes vowel segmenting circuitry responsive to the frequency pattern generated by the speech under analysis for interrupting the flow of energy during intervals when there are no vowel sounds in the speech under analysis.

5. Speech analyzing apparatus as defined in claim 1 including means for dividing the received signal energy into a plurality of channels, each having a frequency range constituting a selected fraction of the preselected total range, and means for electrically connecting each of said fractional signal channels to said signal converting means in predetermined succession.

6. Speech analyzing apparatus as defined in claim 1 wherein said pulse generating means includes binary scaler means receiving the entire energy content of said signal receiving means, and means for conditioning the output of said binary scaler means for application of said scaled energy to said counting means.

7. Speech analyzing apparatus as defined in claim 1 including means for obtaining a plurality of voltage outputs varying each from the other in accordance with the relative time displacements between said pulses.

8. Speech analyzing apparatus as defined in claim 1 including means for processing the total energy content of said signal receiving means in a series of successive stages having a time sequence corresponding to the progressively varying frequency stages constituting the total range of said envelope.

9. Speech analyzing apparatus as defined in claim 2 including means for recycling said pulse-counting means at regularly occurring time intervals.

10. Speech analyzing apparatus as defined in claim 2 including means for progressively sampling the signal content of said signal receiving means, and means for restoring said counting means to the zero position at the termination of each cycle of operation of said sampling means.


